Outagamie County
On the Implicit Adversariality of Catastrophic Forgetting in Deep Continual Learning
Peng, Ze, Zhang, Jian, Guo, Jintao, Qi, Lei, Gao, Yang, Shi, Yinghuan
Continual learning seeks the human-like ability to accumulate new skills in machine intelligence. Its central challenge is catastrophic forgetting, whose underlying cause has not been fully understood for deep networks. In this paper, we demystify catastrophic forgetting by revealing that the new-task training is implicitly an adversarial attack against the old-task knowledge. Specifically, the new-task gradients automatically and accurately align with the sharp directions of the old-task loss landscape, rapidly increasing the old-task loss. This adversarial alignment is intriguingly counter-intuitive because the sharp directions are too sparsely distributed to align with by chance. To understand it, we theoretically show that it arises from training's low-rank bias, which, through forward and backward propagation, confines the two directions into the same low-dimensional subspace, facilitating alignment. Gradient projection (GP) methods, a representative family of forgetting-mitigating methods, reduce adversarial alignment caused by forward propagation, but cannot address the alignment due to backward propagation. We propose backGP to address it, which reduces forgetting by 10.8% and improves accuracy by 12.7% on average over GP methods.
- North America > United States > Wisconsin > Outagamie County > Appleton (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Switzerland (0.04)
- (4 more...)
- Research Report (1.00)
- Workflow (0.67)
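The gradient projection idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the old-task knowledge is summarized by an orthonormal basis of protected directions and removes the new-task gradient's components along them. The basis, the gradient values, and the function names are hypothetical; this is a generic orthogonal projection, not the paper's backGP.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(grad, basis):
    """Remove from `grad` its components along each vector of an orthonormal `basis`."""
    out = list(grad)
    for b in basis:
        coeff = dot(out, b)
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out


# Hypothetical protected subspace: the first two axes of a 4-D parameter space.
old_task_basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
# Hypothetical new-task gradient.
new_task_grad = [0.5, -2.0, 1.0, 3.0]

projected = project_out(new_task_grad, old_task_basis)
print(projected)
```

After projection the update has no component along the protected directions, so a step along it leaves the old-task loss unchanged to first order within that subspace.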
Does Local News Stay Local?: Online Content Shifts in Sinclair-Acquired Stations
Wanner, Miriam, Hager, Sophia, Field, Anjalie
Local news stations are often considered to be reliable sources of non-politicized information, particularly about local concerns that residents care about. Because these stations are trusted news sources, viewers are particularly susceptible to the information they report. The Sinclair Broadcast Group is a broadcasting company that has acquired many local news stations in the last decade. We investigate the effects of local news stations being acquired by Sinclair: how does coverage change? We use computational methods to investigate changes in internet content put out by local news stations before and after being acquired by Sinclair and in comparison to national news outlets. We find clear evidence that local news stations report more frequently on national news at the expense of local topics, and that their coverage of polarizing national topics increases.
- North America > United States > Montana > Missoula County > Missoula (0.28)
- North America > United States > Rhode Island > Providence County > Providence (0.28)
- Asia > Middle East > Israel (0.14)
- (46 more...)
- Media > News (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.92)
Body-terrain interaction affects large bump traversal of insects and legged robots
Small animals and robots must often rapidly traverse large bump-like obstacles when moving through complex 3-D terrains, during which, in addition to leg-ground contact, their body inevitably comes into physical contact with the obstacles. However, we know little about the performance limits of large bump traversal and how body-terrain interaction affects traversal. To address these, we challenged the discoid cockroach and an open-loop six-legged robot to dynamically run into a large bump of varying height to discover the maximal traversal performance, and studied how locomotor modes and traversal performance are affected by body-terrain interaction. Remarkably, during rapid running, both the animal and the robot were capable of dynamically traversing a bump much higher than its hip height (up to 4 times the hip height for the animal and 3 times for the robot, respectively) at traversal speeds typical of running, with decreasing traversal probability with increasing bump height. A stability analysis using a novel locomotion energy landscape model explained why traversal was more likely when the animal or robot approached the bump with a low initial body yaw and a high initial body pitch, and why deflection was more likely otherwise. Inspired by these principles, we demonstrated a novel control strategy of active body pitching that increased the robot's maximal traversable bump height by 75%. Our study is a major step in establishing the framework of locomotion energy landscapes to understand locomotion in complex 3-D terrains. Bioinspiration & Biomimetics (2018), 13, 026005; https://li.me.jhu.edu
- North America > United States > Maryland > Baltimore (0.14)
- North America > United States > Wisconsin > Outagamie County > Appleton (0.04)
- North America > United States > Illinois > Lake County > Waukegan (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
Meta-Learning in Self-Play Regret Minimization
Sychrovský, David, Schmid, Martin, Šustr, Michal, Bowling, Michael
Regret minimization is a general approach to online optimization which plays a crucial role in many algorithms for approximating Nash equilibria in two-player zero-sum games. The literature mainly focuses on solving individual games in isolation. However, in practice, players often encounter a distribution of similar but distinct games, for example when trading correlated assets on the stock market or when refining the strategy in subgames of a much larger game. Recently, offline meta-learning was used to accelerate one-sided equilibrium finding on such distributions. We build upon this, extending the framework to the more challenging self-play setting, which is the basis for most state-of-the-art equilibrium approximation algorithms for domains at scale. When selecting the strategy, our method uniquely integrates information across all decision states, promoting global communication as opposed to the traditional local regret decomposition. Empirical evaluation on normal-form games and river poker subgames shows our meta-learned algorithms considerably outperform other state-of-the-art regret minimization algorithms.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Texas (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- (12 more...)
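As background for the self-play setting in the abstract above, here is a minimal sketch of plain regret matching in self-play on rock-paper-scissors. This is the standard, non-meta-learned baseline, not the paper's algorithm. The game, the asymmetric starting regrets, and the iteration count are assumptions chosen for illustration.

```python
# Row = own action, column = opponent action; order is rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations):
    """Run both players simultaneously and return their average strategies."""
    # Arbitrary asymmetric starting regrets so the dynamics are not trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            own, opp = strategies[player], strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current strategy.
            action_values = [sum(PAYOFF[a][b] * opp[b] for b in range(3)) for a in range(3)]
            expected = sum(own[a] * action_values[a] for a in range(3))
            for a in range(3):
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += own[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = self_play(10000)
print(average_strategies)
```

In a two-player zero-sum game the average strategies of regret minimizers approach a Nash equilibrium, which for rock-paper-scissors is the uniform strategy.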
Achieving the Safety and Security of the End-to-End AV Pipeline
Curran, Noah T., Cho, Minkyoung, Feng, Ryan, Liu, Liangkai, Tang, Brian Jay, MohajerAnsari, Pedram, Domeke, Alkim, Pesé, Mert D., Shin, Kang G.
In the current landscape of autonomous vehicle (AV) safety and security research, there are multiple isolated problems being tackled by the community at large. Due to the lack of common evaluation criteria, several important research questions are at odds with one another. For instance, while much research has been conducted on physical attacks deceiving AV perception systems, there are often inadequate investigations of working defenses and of the downstream effects on safe vehicle control. This paper provides a thorough description of the current state of AV safety and security research. We provide individual sections for the primary research questions that concern this research area, including AV surveillance, sensor system reliability, security of the AV stack, algorithmic robustness, and safe environment interaction. We wrap up the paper with a discussion of the issues that concern the interactions of these separate problems. At the conclusion of each section, we propose future research questions that still lack conclusive answers. This position article will serve as an entry point to novice and veteran researchers seeking to partake in this research domain.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > New York > New York County > New York City (0.08)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.05)
- (10 more...)
- Transportation > Ground > Road (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- (3 more...)
Partial-differential-algebraic equations of nonlinear dynamics by Physics-Informed Neural-Network: (I) Operator splitting and framework assessment
Vu-Quoc, Loc, Humer, Alexander
Several forms for constructing novel physics-informed neural-networks (PINN) for the solution of partial-differential-algebraic equations based on derivative operator splitting are proposed, using the nonlinear Kirchhoff rod as a prototype for demonstration. The open-source DeepXDE is likely the best-documented framework, with many examples. Yet, we encountered some pathological problems and proposed novel methods to resolve them. Among these novel methods are the PDE forms, which evolve from the lower-level form with fewer unknown dependent variables to higher-level form with more dependent variables, in addition to those from lower-level forms. Traditionally, the highest-level form, the balance-of-momenta form, is the starting point for (hand) deriving the lowest-level form through a tedious (and error-prone) process of successive substitutions. The next step in a finite element method is to discretize the lowest-level form upon forming a weak form and linearization with appropriate interpolation functions, followed by their implementation in a code and testing. The time-consuming tedium in all of these steps could be bypassed by applying the proposed novel PINN directly to the highest-level form. We developed a script based on JAX. While our JAX script did not show the pathological problems of DDE-T (DDE with TensorFlow backend), it is slower than DDE-T. That DDE-T itself is more efficient in higher-level form than in lower-level form makes working directly with higher-level form even more attractive, in addition to the advantages mentioned further above. Since coming up with an appropriate learning-rate schedule for a good solution is more art than science, we systematically codified in detail our experience running optimization through a normalization/standardization of the network-training process so readers can reproduce our results.
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > United States > Wisconsin > Outagamie County > Appleton (0.04)
- North America > United States > New York (0.04)
- (5 more...)
Neuron-centric Hebbian Learning
Ferigo, Andrea, Cunegatti, Elia, Iacca, Giovanni
One of the most striking capabilities behind the learning mechanisms of the brain is the adaptation, through structural and functional plasticity, of its synapses. While synapses have the fundamental role of transmitting information across the brain, several studies show that it is the neuron activations that produce changes on synapses. Yet, most plasticity models devised for artificial Neural Networks (NNs), e.g., the ABCD rule, focus on synapses, rather than neurons, therefore optimizing synaptic-specific Hebbian parameters. This approach, however, increases the complexity of the optimization process since each synapse is associated with multiple Hebbian parameters. To overcome this limitation, we propose a novel plasticity model, called Neuron-centric Hebbian Learning (NcHL), where optimization focuses on neuron- rather than synaptic-specific Hebbian parameters. Compared to the ABCD rule, NcHL reduces the parameters from $5W$ to $5N$, where $W$ and $N$ are the numbers of weights and neurons, respectively, and usually $N \ll W$. We also devise a ``weightless'' NcHL model, which requires less memory by approximating the weights based on a record of neuron activations. Our experiments on two robotic locomotion tasks reveal that NcHL performs comparably to the ABCD rule, despite using up to $\sim97$ times fewer parameters, thus allowing for scalable plasticity
- Oceania > Australia > Victoria > Melbourne (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
- (7 more...)
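The $5W$ versus $5N$ comparison in the abstract above can be made concrete with a small worked example. The sketch assumes a fully connected feed-forward network without bias terms and counts input neurons in $N$; the layer sizes are hypothetical and are not taken from the paper's experiments.

```python
def hebbian_parameter_counts(layer_sizes):
    """Return (5W, 5N) for a fully connected network with the given layer sizes.

    W counts the weights between consecutive layers (no biases, an assumption);
    N counts all neurons, input layer included (also an assumption).
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    neurons = sum(layer_sizes)
    return 5 * weights, 5 * neurons


# Hypothetical network: 8 inputs, 16 hidden units, 4 outputs.
synaptic, neuronal = hebbian_parameter_counts([8, 16, 4])
print(synaptic, neuronal, synaptic / neuronal)
```

Because $W$ grows with the product of adjacent layer widths while $N$ grows with their sum, the gap widens as layers get wider.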
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Song, Kaiqiang, Wang, Xiaoyang, Cho, Sangwoo, Pan, Xiaoman, Yu, Dong
This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information. Recognizing the inherent challenges in extending the context window for LLMs, primarily built on Transformer architecture, we propose a new model architecture, referred to as Zebra. This architecture efficiently manages the quadratic time and memory complexity issues associated with full attention in the Transformer by employing grouped local-global attention layers. Our model, akin to a zebra's alternating stripes, balances local and global attention layers, significantly reducing computational requirements and memory consumption. Comprehensive experiments, including pretraining from scratch, continuation of long context adaptation training, and long instruction tuning, are conducted to evaluate the Zebra's performance. The results show that Zebra achieves comparable or superior performance on both short and long sequence benchmarks, while also enhancing training and inference efficiency.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Africa > Middle East > Egypt (0.14)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
- (33 more...)
- Education (1.00)
- Leisure & Entertainment (0.92)
- Media > Music (0.46)
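To give a feel for why mixing local and global attention layers reduces cost, as the Zebra abstract above describes, the following sketch counts the (query, key) pairs scored by a stack of causal attention layers. The schedule (one global layer at the start of each group, local layers otherwise), the window size, and all function names are assumptions for illustration, not details taken from the paper.

```python
def causal_pairs(seq_len, window=None):
    """Number of (query, key) pairs one causal attention layer scores.

    window=None means full attention; otherwise each query attends to at most
    `window` most recent positions, itself included.
    """
    if window is None:
        return seq_len * (seq_len + 1) // 2
    return sum(min(i + 1, window) for i in range(seq_len))


def layer_schedule(num_layers, group_size):
    """Assumed schedule: the first layer of each group is global, the rest are local."""
    return ["global" if i % group_size == 0 else "local" for i in range(num_layers)]


def total_pairs(seq_len, num_layers, group_size, window):
    """Total scored pairs across the whole stack under the assumed schedule."""
    return sum(
        causal_pairs(seq_len, None if kind == "global" else window)
        for kind in layer_schedule(num_layers, group_size)
    )


# Hypothetical configuration: 8 layers, groups of 4, window of 128, 1024 tokens.
mixed = total_pairs(seq_len=1024, num_layers=8, group_size=4, window=128)
full = 8 * causal_pairs(1024)
print(mixed, full, mixed / full)
```

Full attention grows quadratically in sequence length while windowed attention grows linearly, so the saving from the local layers increases with longer contexts.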
A Privacy Preserving System for Movie Recommendations Using Federated Learning
Neumann, David, Lutz, Andreas, Müller, Karsten, Samek, Wojciech
Recommender systems have become ubiquitous in the past years. They solve the tyranny of choice problem faced by many users, and are utilized by many online businesses to drive engagement and sales. Besides other criticisms, like creating filter bubbles within social networks, recommender systems are often reproved for collecting considerable amounts of personal data. However, to personalize recommendations, personal information is fundamentally required. A recent distributed learning scheme called federated learning has made it possible to learn from personal user data without its central collection. Consequently, we present a recommender system for movie recommendations, which provides privacy and thus trustworthiness on multiple levels: First and foremost, it is trained using federated learning and thus, by its very nature, privacy-preserving, while still enabling users to benefit from global insights. Furthermore, a novel federated learning scheme, called FedQ, is employed, which not only addresses the problem of non-i.i.d.-ness and small local datasets, but also prevents input data reconstruction attacks by aggregating client updates early. Finally, to reduce the communication overhead, compression is applied, which significantly compresses the exchanged neural network parametrizations to a fraction of their original size. We conjecture that this may also improve data privacy through its lossy quantization stage.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (35 more...)
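For readers unfamiliar with federated learning as used in the abstract above, here is a minimal sketch of the server-side aggregation step in standard federated averaging. It is not the paper's FedQ scheme, whose details the abstract does not give. The client data and the function name are hypothetical.

```python
def federated_average(client_updates):
    """Average client parameter vectors, weighted by each client's example count.

    client_updates: list of (num_examples, weights) pairs, where `weights` is a
    flat list of floats of the same length for every client.
    """
    total_examples = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [
        sum(n * w[k] for n, w in client_updates) / total_examples
        for k in range(dim)
    ]


# Hypothetical round: two clients send locally trained weights, never raw data.
updates = [(10, [1.0, 2.0]), (30, [3.0, 6.0])]
global_weights = federated_average(updates)
print(global_weights)
```

Only model parameters leave each client in this scheme, which is the property that lets training proceed without central collection of personal data.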
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Suzgun, Mirac, Melas-Kyriazi, Luke, Jurafsky, Dan
In open-ended natural-language generation, existing text decoding methods typically struggle to produce text which is both diverse and high-quality. Greedy and beam search are known to suffer from text degeneration and linguistic diversity issues, while temperature, top-k, and nucleus sampling often yield diverse but low-quality outputs. In this work, we present crowd sampling, a family of decoding methods based on Bayesian risk minimization, to address this diversity-quality trade-off. Inspired by the principle of "the wisdom of the crowd," crowd sampling seeks to select a candidate from a pool of candidates that has the least expected risk (i.e., highest expected reward) under a generative model according to a given utility function. Crowd sampling can be seen as a generalization of numerous existing methods, including majority voting, and in practice, it can be used as a drop-in replacement for existing sampling methods. Extensive experiments show that crowd sampling delivers improvements of 3-7 ROUGE and BLEU points across a wide range of tasks, including summarization, data-to-text, translation, and textual style transfer, while achieving new state-of-the-art results on WebNLG and WMT'16.
- North America > United States > Wisconsin > Outagamie County > Appleton (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Nepal > Bagmati Province > Kathmandu District > Kathmandu (0.04)
- (35 more...)
- Research Report > New Finding (0.68)
- Personal > Obituary (0.46)
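The selection rule described in the abstract above, choosing the candidate with the highest expected utility against the rest of the pool, can be sketched as follows. The unigram-F1 utility and the candidate sentences are illustrative assumptions; the paper evaluates with other metrics, and the function names here are hypothetical.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Illustrative utility: F1 over whitespace-token overlap."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest mean utility against the other candidates.

    The pool itself serves as a sample-based stand-in for the model distribution.
    Requires at least two candidates.
    """
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical pool of sampled generations.
pool = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the cat is on the mat",
    "dogs bark loudly",
]
print(mbr_select(pool))
```

With an exact-match utility, for example `mbr_select(pool, utility=lambda h, r: float(h == r))`, the same rule reduces to majority voting over the pool, which matches the abstract's remark that crowd sampling generalizes it.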